Segmentation of Chinese Discourse in Content-Based Information Retrieval
Abstract
In this paper, we present a novel approach to automatic discourse segmentation that does not require full semantic understanding. To analyse the textual bonds and determine the degree of coherence a discourse exhibits, we first represent the wide diversity of textual relations in a discourse network. We encode a set of mutual linguistic constraints that largely determine the similarity of meaning among lexical items. Topic boundaries in a discourse are then identified by a computational method that finds segment clusters in the higher-order structure of the discourse network, so segmentation amounts to identifying the shifts from one segment cluster to another. Experimental results show that our formulation is capable of addressing topic shifts in texts. A comparison with a related method demonstrates that the combination of constraints is closely related to the topic boundaries among textual segments. An evaluation using recall and precision shows the effectiveness of our approach on a collection of Chinese newswire articles.
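The abstract does not spell out the linguistic constraints or the exact cluster-finding method, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes sentences are already word-segmented into token lists, uses Jaccard lexical overlap as a stand-in for the paper's similarity constraints, and places a boundary wherever the average edge weight linking a window of sentences before a gap to a window after it falls below a threshold (a shift from one cluster of mutually linked sentences to another). The helper names, `window`, and `threshold` are hypothetical. Boundary recall and precision are computed over gap positions.

```python
from itertools import combinations


def jaccard(a, b):
    """Lexical overlap of two token lists (hypothetical stand-in for the paper's constraints)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def build_network(sentences):
    """Discourse network as a dict mapping (i, j), i < j, to a positive edge weight."""
    weights = {}
    for i, j in combinations(range(len(sentences)), 2):
        w = jaccard(sentences[i], sentences[j])
        if w > 0:
            weights[(i, j)] = w
    return weights


def cohesion_at(gap, weights, n, window):
    """Mean edge weight between the `window` sentences before a gap and those after it."""
    left = range(max(0, gap - window + 1), gap + 1)
    right = range(gap + 1, min(n, gap + 1 + window))
    pairs = [(i, j) for i in left for j in right]  # always i < j, matching the dict keys
    return sum(weights.get(p, 0.0) for p in pairs) / len(pairs)


def segment(sentences, window=2, threshold=0.1):
    """Return gap indices g; a boundary falls between sentence g and sentence g + 1."""
    n = len(sentences)
    weights = build_network(sentences)
    return [g for g in range(n - 1) if cohesion_at(g, weights, n, window) < threshold]


def boundary_recall_precision(predicted, reference):
    """Recall and precision of predicted boundary gaps against reference gaps."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    recall = hits / len(reference) if reference else 0.0
    precision = hits / len(predicted) if predicted else 0.0
    return recall, precision


if __name__ == "__main__":
    # Toy, pre-segmented document: three finance sentences, then two football sentences.
    doc = [
        ["股市", "上涨", "投资者"],
        ["股市", "投资者", "银行"],
        ["银行", "股市", "利率"],
        ["足球", "比赛", "球队"],
        ["球队", "比赛", "冠军"],
    ]
    found = segment(doc)
    print(found)  # [2]: boundary between sentence 2 and sentence 3
    print(boundary_recall_precision(found, [2]))  # (1.0, 1.0)
```

The windowed cross-gap score is one simple way to read "shifts between clusters" off a sentence network; a faithful reimplementation would replace `jaccard` with the paper's mutual constraints and the threshold rule with its higher-order cluster identification.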
Similar resources
Topic Identification in Chinese Discourse Based on Centering Model
In this article we are concerned with identifying topics of utterances in texts, which are discourse elements reflecting the links between a sentence and its context. The information carried by the topics can contribute to a number of natural language processing applications, such as information retrieval, text categorization and discourse segmentation. However, the phenomenon of...
Chinese Spam Filtering Based On Back-Propagation Neural Networks
As email becomes an important means of communication on the network, spam is increasing every day. This paper describes a new content-based email filtering model that uses Back-Propagation Neural Networks (BPNN). For Chinese email, it uses the Natural Language Processing & Information Retrieval Sharing Platform (NLPIR) system to perform Chinese word segmentation. The simula...
Prosody-based Topic Segmentation for Mandarin Broadcast News
Automatic topic segmentation, separation of a discourse stream into its constituent stories or topics, is a necessary preprocessing step for applications such as information retrieval, anaphora resolution, and summarization. While significant progress has been made in this area for text sources and for English audio sources, little work has been done in automatic, acoustic feature-based segment...
Assessing Prosodic And Text Features For Segmentation Of Mandarin Broadcast News
Automatic topic segmentation, separation of a discourse stream into its constituent stories or topics, is a necessary preprocessing step for applications such as information retrieval, anaphora resolution, and summarization. While significant progress has been made in this area for text sources and for English audio sources, little work has been done in automatic segmentation of other languages...
Chinese Discourse Segmentation Based on Punctuation Marks
This paper addresses Chinese discourse segmentation based on punctuation marks. In particular, we propose various lexical, syntactic, positional and punctuation features to train classifiers for Chinese discourse segmentation. Experimental results on CDTB (Chinese Discourse Treebank) show that our punctuation-based method is appropriate for Chinese discourse segmentation, with 89.2%...